Exploiting a Computation Reuse Cache to Reduce Energy in Network Processors
Abstract
High-end routers are designed to provide worst-case throughput guarantees rather than low latency. Caches, on the other hand, are meant to improve latency, not throughput, in a traditional processor, and they provide no additional throughput for a balanced network processor design. This is why most high-end routers do not use caches for their data-plane algorithms. In this paper we examine how to use a cache in a balanced, high-bandwidth network processor. We focus on using a cache not as a latency-saving mechanism but as an energy-saving device. We propose a Computation Reuse Cache that caches the answer to a query for data-plane algorithms: the tags are the inputs to the query, and the block is the result of the query. This allows the data-plane algorithm to perform a complete query in one cache access on a hit, which creates slack by reducing the number of instructions executed. We then exploit this slack by fetch-gating the data-plane algorithm while still matching the worst-case throughput guarantees of the rest of the network processor. We evaluate the Computation Reuse Cache for the network data-plane algorithms IP lookup, packet classification, and the NAT protocol.
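To make the tag/block idea concrete, here is a minimal software sketch, not the paper's hardware design. It assumes a direct-mapped organization, a 32-bit destination address as the query input, and a toy three-entry route table; the names `ComputationReuseCache`, `ROUTES`, and `longest_prefix_match` are hypothetical and chosen only for illustration. Fetch gating is not modeled; the hit and miss counters only indicate how many full lookups were skipped.

```python
from typing import Callable, List, Optional

# Hypothetical route table: (prefix, prefix length, output port).
ROUTES = [
    (0x0A000000, 8, 1),   # 10.0.0.0/8  -> port 1
    (0x0A010000, 16, 2),  # 10.1.0.0/16 -> port 2
    (0x00000000, 0, 0),   # default     -> port 0
]


def longest_prefix_match(addr: int) -> int:
    """Slow path: scan every route and keep the longest matching prefix."""
    best_len, best_port = -1, -1
    for prefix, length, port in ROUTES:
        mask = ((1 << length) - 1) << (32 - length) if length else 0
        if (addr & mask) == prefix and length > best_len:
            best_len, best_port = length, port
    return best_port


class ComputationReuseCache:
    """Direct-mapped cache whose tag is the query input and whose block is the result."""

    def __init__(self, num_sets: int, compute: Callable[[int], int]) -> None:
        self.num_sets = num_sets
        self.compute = compute  # the full data-plane algorithm
        self.tags: List[Optional[int]] = [None] * num_sets
        self.blocks: List[Optional[int]] = [None] * num_sets
        self.hits = 0
        self.misses = 0

    def query(self, key: int) -> int:
        index = key % self.num_sets
        if self.tags[index] == key:
            # Hit: the whole query is answered by one cache access.
            self.hits += 1
            return self.blocks[index]
        # Miss: run the full algorithm, then store input and result together.
        self.misses += 1
        result = self.compute(key)
        self.tags[index] = key
        self.blocks[index] = result
        return result
```

In this sketch, calling `query(0x0A010203)` twice on a fresh `ComputationReuseCache(16, longest_prefix_match)` runs `longest_prefix_match` only the first time; the second call is served from the cache. The instructions skipped on such hits correspond to the slack the abstract describes.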
Similar resources
Visualization Enables the Programmer to Reduce Cache Misses
Many programs' execution speed suffers from cache misses. These can be reduced on three different levels: the hardware level, the compiler level and the algorithm level. Much work has been done on the hardware level and the compiler level; however, relatively little work has been done on assisting the programmer to increase the locality in his programs. In this paper, a method is proposed to visua...
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
Evaluating the Reuse Cache for mobile processors
There has been a lot of research over the years on novel cache architectures to improve cache performance or reduce cache area. However, the focus of these efforts has been on desktop processors and compute workloads. Today, mobile processors represent a significant chunk of the market, and are becoming as complex as their desktop counterparts. Area optimizations are crucial to drive down costs...
A predictable hardware to exploit temporal reuse in real-time and embedded systems
In this paper we propose a new hardware data cache (FAFB, fully-associative FIFO tagged buffers) to complement the data cache in processors. It provides predictability when exploiting temporal reuse in array data structures, i.e. it allows an accurate WCET analysis, which is required in real-time systems. With our hardware proposal, compiler transformations that exploit such reuse (essentially ...
Synchronization and Pipelining on Multicore: Shaping Parallelism for a New Generation of Processors
The potential for higher performance from increasing on-chip transistor densities, on the one hand, and the limitations in instruction-level parallelism of sequential applications and in the scalability of increasingly complicated superscalar and multithreaded architectures, on the other, are leading the microprocessor industry to embrace chip multi-processors as a cost-effective solution for t...